Module 8 - Trees

Overview

This week, we begin working with highly nonlinear models that stand in stark contrast to linear and logistic regression: decision trees, which come in two forms, classification trees and regression trees. These models make successive cuts in the data space, producing a tree structure in which each “leaf” is a subset of the data space accompanied by a rule that predicts either a number (regression trees) or a class (classification trees). Finding the optimal tree for a given problem is usually computationally intractable, a situation common to many of the most complex and powerful ML algorithms, so the methods used to find trees are non-deterministic and based on heuristics. We will build your intuition for how decision trees work by contrasting them with other models, and we will discover that, on their own, single trees make poor models.
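To make the idea of a "cut" concrete, here is a minimal sketch of a regression tree with a single split on one feature. It is an illustration only, not the algorithm used in the readings or in any particular library: the helper names (`sse`, `best_cut`, `predict`), the squared-error criterion, and the toy data are all assumptions chosen for this example.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_cut(x, y):
    """Return (threshold, left_mean, right_mean) for the single cut on x
    that minimizes the total squared error of the two resulting leaves."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between equal x values
        # Candidate threshold: midpoint between two neighboring x values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yi for _, yi in pairs[:i]]
        right = [yi for _, yi in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict(x_new, threshold, left_mean, right_mean):
    """Each leaf predicts the mean response of the training points in it."""
    return left_mean if x_new <= threshold else right_mean


# Toy data (made up for illustration): two well-separated groups.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

threshold, left_mean, right_mean = best_cut(x, y)
print(threshold, left_mean, right_mean)
print(predict(2.5, threshold, left_mean, right_mean))
```

A full tree would repeat this search within each leaf, over every feature, until some stopping rule is met. Because each cut is chosen greedily, the result is a heuristic and is not guaranteed to be the optimal tree.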

Learning Objectives

  • Defining classification and regression trees
  • Contrasting trees with other models
  • Algorithms for trees and key hyperparameters
  • Challenges with using trees for inference

Readings

  • ISLP (An Introduction to Statistical Learning with Applications in Python): 8.1

Videos